On Deconstructing Ensemble Models
Abstract
Consider a prediction problem with correlated predictors. In such a case, the best model specification, that is, the best subset of active predictors, can be ambiguous. Despite this ambiguity, a forecast that informs a high-stakes decision warrants a compact, informative description of the model that produces it. For forecasts based on ensemble models, such descriptions are not straightforward. Our example considers searches on google.com; each observation consists of one experiment that changes details of how the system responds to user queries. Our predictors measure the changes in short-term metrics, relative to a contemporaneous control. Our response measures a shift in user behavior that is observable only over a longer term, also calculated relative to the control. Our ensemble of models comes from a spike-and-slab regression. We represent each ensemble member (each model) by its specification, a vector of booleans denoting the active predictors. For each such model we calculate a goodness-of-fit statistic. Applying logic regression to predict goodness of fit as a function of the specification booleans, we obtain a metamodel. As a weighted sum of boolean expressions, the metamodel provides a description that is both parsimonious and illuminating.

Key words: collinearity, factor analysis, logic regression, model deconstruction, spike-and-slab regression, variance function
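To make the idea of a metamodel concrete, the following is a minimal sketch of predicting goodness of fit from specification booleans as a weighted sum of boolean expressions. It is not the authors' procedure: logic regression proper searches over logic trees, whereas this toy stand-in greedily picks from single predictors and pairwise AND/OR terms and fits the weights by least squares. The function names (`candidate_expressions`, `fit_metamodel`), the four-predictor setup, and the synthetic goodness-of-fit values are all assumptions made for illustration.

```python
import itertools
import numpy as np


def candidate_expressions(n_predictors):
    """Enumerate simple boolean expressions: x_i, (x_i and x_j), (x_i or x_j)."""
    exprs = []
    for i in range(n_predictors):
        exprs.append((f"x{i}", lambda S, i=i: S[:, i]))
    for i, j in itertools.combinations(range(n_predictors), 2):
        exprs.append((f"(x{i} and x{j})", lambda S, i=i, j=j: S[:, i] & S[:, j]))
        exprs.append((f"(x{i} or x{j})", lambda S, i=i, j=j: S[:, i] | S[:, j]))
    return exprs


def fit_metamodel(specs, fit_stat, n_terms=2):
    """Greedy forward selection of boolean terms (a simplified stand-in for
    logic regression); weights come from ordinary least squares."""
    specs = np.asarray(specs, dtype=bool)
    y = np.asarray(fit_stat, dtype=float)
    exprs = candidate_expressions(specs.shape[1])
    chosen = []
    columns = [np.ones(len(y))]  # intercept column
    for _ in range(n_terms):
        best = None
        for name, fn in exprs:
            if name in chosen:
                continue
            X = np.column_stack(columns + [fn(specs).astype(float)])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = float(np.sum((y - X @ coef) ** 2))
            if best is None or rss < best[0]:
                best = (rss, name, fn)
        chosen.append(best[1])
        columns.append(best[2](specs).astype(float))
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return chosen, coef  # coef[0] is the intercept, then one weight per term


# Synthetic "ensemble": every specification over 4 hypothetical predictors,
# with a made-up goodness-of-fit statistic that rewards (x0 or x1) and x3.
specs = np.array(list(itertools.product([False, True], repeat=4)))
fit_stat = 1.0 + 2.0 * (specs[:, 0] | specs[:, 1]) + 0.5 * specs[:, 3]

terms, weights = fit_metamodel(specs, fit_stat, n_terms=2)
print(terms, np.round(weights, 3))
```

In this toy setting the selected terms read directly as a description of which predictors, alone or in combination, are associated with better-fitting ensemble members. That is the kind of compact summary the abstract describes, although a real spike-and-slab ensemble would supply sampled specifications and noisy fit statistics rather than a full factorial.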